Yerui Sun

MONA: Muon Optimizer with Nesterov Acceleration for Scalable Language Model Training

May 26, 2026
Via arXiv

FG$^2$-GDN: Enhancing Long-Context Gated Delta Networks with Doubly Fine-Grained Control

Apr 21, 2026
Via arXiv

SparseBalance: Load-Balanced Long Context Training with Dynamic Sparse Attention

Apr 15, 2026
Via arXiv

AsyncTLS: Efficient Generative LLM Inference with Asynchronous Two-level Sparse Attention

Apr 09, 2026
Via arXiv

LongCat-Next: Lexicalizing Modalities as Discrete Tokens

Mar 29, 2026
Via arXiv

Scaling Embeddings Outperforms Scaling Experts in Language Models

Jan 29, 2026
Via arXiv

LongCat-Flash-Thinking-2601 Technical Report

Jan 23, 2026
Via arXiv

Efficient Context Scaling with LongCat ZigZag Attention

Dec 30, 2025
Via arXiv

AFA-LoRA: Enabling Non-Linear Adaptations in LoRA with Activation Function Annealing

Dec 27, 2025
Via arXiv

Accelerate Speculative Decoding with Sparse Computation in Verification

Dec 26, 2025
Via arXiv